Semantic-Guided Selective Representation for Image Captioning
Authors
Abstract
Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, their application to image captioning encounters two main issues, namely, noisy features and fragmented semantics. In this paper, we propose a novel feature selection scheme, with Relation-Aware Selection (RAS) and a Fine-grained Semantic Guidance (FSG) learning strategy. Based on the grid-wise interactions, RAS can enhance salient regions and channels, and suppress less important ones. In addition, this selection process is guided by FSG, which uses fine-grained semantic knowledge to supervise the selection process. Experimental results on MS COCO show that the proposed RAS-FSG scheme achieves state-of-the-art performance on both off-line and on-line testing, i.e., 134.3 CIDEr for off-line testing and 135.4 for on-line testing of MSCOCO. Extensive ablation studies and visualizations also validate the effectiveness of our scheme.
Similar resources
Text-Guided Attention Model for Image Captioning
Visual attention plays an important role to understand images and demonstrates its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...
A Distributed Representation Based Query Expansion Approach for Image Captioning
In Figure 1, we present more example results obtained with our approach on the benchmark datasets Flickr8K (Hodosh et al., 2013), Flickr30K (Young et al., 2014), MS COCO (Lin et al., 2014). We also provide groundtruth human descriptions for comparison. There are some cases where our approach falls short. In some of those cases, although the system does not produce the most desirable results, it...
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-tr...
Contrastive Learning for Image Captioning
Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learn...
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multistage prediction framework for image captioning, composed of multiple decoders each of which...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3243952